DR-0002 Use syntect to extract doc comments from code
Manuel Woelker 2025-06-22T14:21:06+00:00
Status: Approved
Date: 2025-06-19
Decision
To extract doc comments from source code, we will use the syntect crate.
Context
To extract doc comments, we need to locate all comments in source files written in a variety of programming languages.
The requirements for this extractor are:
- Broad support for many programming languages
- Robustness against invalid code/syntax
- Good performance
Consequences
syntect is used to extract doc comments from the code.
To support as many languages as possible, the two-face crate is used; it supplies additional syntax definitions beyond syntect's built-in defaults.
Considered Alternatives
Custom lexer
A custom lexer could be implemented to find comments. Given the number of languages and the complexity of their differing syntaxes, this would likely be costly to build and maintain. In particular, correctly handling "comment-like" syntax inside strings could require a separate lexer for each language.
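The string-handling problem can be illustrated with a minimal sketch. Both functions below are hypothetical and are not part of this project; they assume C-like `//` line comments and double-quoted strings with backslash escapes. The naive version mistakes the `//` inside a URL string for a comment. The string-aware version fixes that one case, but it would still need different rules for other languages (for example `#` comments, raw strings, or character literals).

```rust
/// Naive approach: treat the first `//` on a line as the start of a comment.
/// Returns the text after the marker, if any.
fn naive_line_comment(line: &str) -> Option<&str> {
    line.find("//").map(|idx| &line[idx + 2..])
}

/// String-aware approach for C-like syntax only (an illustrative assumption):
/// ignore `//` while inside a double-quoted string, honoring `\"` escapes.
fn string_aware_line_comment(line: &str) -> Option<&str> {
    let bytes = line.as_bytes();
    let mut in_string = false;
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            // Inside a string, a backslash escapes the next byte, so skip it.
            b'\\' if in_string => i += 1,
            // An unescaped quote opens or closes a string literal.
            b'"' => in_string = !in_string,
            // `//` outside of a string starts a line comment.
            b'/' if !in_string && bytes.get(i + 1) == Some(&b'/') => {
                return Some(&line[i + 2..]);
            }
            _ => {}
        }
        i += 1;
    }
    None
}
```

Given the line `let url = "http://example.com"; // real comment`, the naive function returns everything after the `//` in the URL, while the string-aware function returns only ` real comment`. Each additional language would need its own variant of this logic, which is the maintenance burden described above.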
tree-sitter
tree-sitter parsers could be used to extract the comments from source files.
The drawback is that these parsers need to be curated, are platform-specific and are relatively heavyweight.
inkjet
inkjet bundles ~70 tree-sitter parsers for various languages.
The downside of this approach is that all of these parsers must be compiled (making compilation much slower) and bundled into the binary (making the binary much larger).